CLAIMS



1. One or more computer readable media having stored thereon a 
plurality of instructions that, when executed by one or more processors, causes the 
one or more processors to perform acts including: 

receiving the audio portion of a sporting event; 

classifying a set of segments of the audio portion as excited speech; 

classifying a set of frame groupings as including baseball hits; 

combining the set of segments and the set of frame groupings to identify 
probabilities for each segment that the segment is an exciting segment; and 

saving an indication of the set of segments and the corresponding 
probabilities as meta data corresponding to the sporting event. 

2. One or more computer readable media as recited in claim 1, wherein 
each segment includes at least ten 0.5-second windows of the audio portion. 

3. One or more computer readable media as recited in claim 1, wherein 
each frame grouping is a grouping of 25 10-millisecond frames of the audio 
portion. 

4. One or more computer readable media as recited in claim 1, wherein 
the classifying a set of segments as excited speech comprises: 

extracting a first set of features from the audio portion; 
identifying, based on the first set of features, a plurality of windows of the 
audio portion that include speech;
extracting a second set of features from the audio portion; and
identifying, based on the second set of features, which of the plurality of 
windows include excited speech. 

5. One or more computer readable media as recited in claim 4, wherein 
extracting the first set of features comprises, for each of a plurality of windows: 

identifying an average waveform amplitude of the audio portion in a first 
frequency band of the window; 

identifying an average waveform amplitude of the audio portion in a second 
frequency band of the window; 

concatenating the identified average waveform amplitudes to generate an 
energy feature of the first set of features; and 

determining, for an MFCC feature of the first set of features, the Mel- 
frequency Cepstral coefficient of the window. 

6. One or more computer readable media as recited in claim 5, wherein 
the identifying a plurality of windows of the audio portion that include speech 
comprises determining that a window includes speech if the energy feature 
exceeds a first threshold and the MFCC feature exceeds a second threshold. 

7. One or more computer readable media as recited in claim 4, wherein 
extracting the second set of features comprises, for each of a plurality of windows: 

identifying, for each of a plurality of frames in the window, an average 
waveform amplitude of the audio portion in a first frequency band;

identifying, for each of the plurality of frames in the window, an average
waveform amplitude of the audio portion in a second frequency band; 

concatenating the identified average waveform amplitudes to generate an 
energy feature; 

extracting, as a pitch feature of each frame, the pitch of each frame; and
identifying a plurality of statistics regarding each window based on the 
energy features and pitch features of the plurality of frames. 

8. One or more computer readable media as recited in claim 7, wherein 
the identifying a plurality of statistics further comprises: 

identifying a maximum energy; 
identifying an average energy; 
identifying an energy dynamic range; 
identifying a maximum pitch; 
identifying an average pitch; and 
identifying a dynamic pitch range. 
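
The six window statistics recited above can be sketched as follows. This is only an illustrative sketch: it assumes each frame contributes one scalar energy value and one scalar pitch value, and it takes "dynamic range" to mean maximum minus minimum, neither of which is specified by the claims. The function name `window_statistics` is hypothetical.

```python
from statistics import mean


def window_statistics(frame_energies, frame_pitches):
    """Summarize one window from its per-frame energy and pitch values.

    Assumption: each frame is represented by a scalar energy and a scalar
    pitch, and "dynamic range" means max minus min.
    """
    if not frame_energies or not frame_pitches:
        raise ValueError("a window needs at least one frame")
    return {
        "max_energy": max(frame_energies),
        "avg_energy": mean(frame_energies),
        "energy_range": max(frame_energies) - min(frame_energies),
        "max_pitch": max(frame_pitches),
        "avg_pitch": mean(frame_pitches),
        "pitch_range": max(frame_pitches) - min(frame_pitches),
    }


if __name__ == "__main__":
    stats = window_statistics([1.0, 3.0, 2.0], [100.0, 200.0, 150.0])
    for name, value in stats.items():
        print(name, value)
```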

9. One or more computer readable media as recited in claim 7, wherein 
the identifying which of the plurality of windows include excited speech 
comprises identifying a posterior probability that the window corresponds to an 
excited speech class and identifying a posterior probability that the window 
corresponds to a non-excited speech class, and classifying the window in the class 
with the higher posterior probability.

10. One or more computer readable media as recited in claim 4, further
comprising instructions that cause the one or more processors to perform acts 
including outputting an excited speech probability for each of the segments that 
include excited speech, the excited speech probability for a segment indicating a 
likelihood that the segment includes excited speech. 

11. One or more computer readable media as recited in claim 1, wherein
the classifying a set of frame groupings as including baseball hits comprises:

extracting, for each frame in a multiple-frame grouping, a set of features 
from the audio portion; 

comparing the set of features from the multiple-frame groupings to a set of
templates; and 

identifying, for each of the multiple-frame groupings, a probability that the 
grouping includes a baseball hit based on how well the grouping matches the set of 
templates. 
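
One way the template comparison recited above might be sketched is shown below. The claims do not say how a match is scored, so this sketch assumes a Euclidean distance to the closest template, mapped to a value in (0, 1] by a decaying exponential. The names `grouping_distance`, `hit_probability`, and the `scale` parameter are hypothetical.

```python
import math


def grouping_distance(grouping, template):
    """Euclidean distance between two groupings of per-frame feature lists."""
    if len(grouping) != len(template):
        raise ValueError("grouping and template must have the same frame count")
    total = 0.0
    for frame, template_frame in zip(grouping, template):
        if len(frame) != len(template_frame):
            raise ValueError("frames must have the same number of features")
        total += sum((a - b) ** 2 for a, b in zip(frame, template_frame))
    return math.sqrt(total)


def hit_probability(grouping, templates, scale=1.0):
    """Score how well a grouping matches its closest template.

    Assumption: exp(-distance / scale) is used as the probability, so an
    exact match scores 1.0 and the score decays as the distance grows.
    """
    if not templates:
        raise ValueError("at least one template is required")
    best = min(grouping_distance(grouping, t) for t in templates)
    return math.exp(-best / scale)


if __name__ == "__main__":
    template = [[3.0, 0.0], [0.0, 4.0]]
    print(hit_probability(template, [template]))
    print(hit_probability([[0.0, 0.0], [0.0, 0.0]], [template]))
```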

12. One or more computer readable media as recited in claim 11, 
wherein the extracting comprises, for each frame: 

identifying an average waveform amplitude of the audio portion in a first 
frequency band of the frame; 

identifying an average waveform amplitude of the audio portion in a second 
frequency band of the frame; 

concatenating the identified average waveform amplitudes to generate a 
first energy feature of the set of features;

identifying an average waveform amplitude of the audio portion in a third 
frequency band of the frame; and 

using, as a second energy feature of the set of features, the average 
waveform amplitude of the third frequency band. 

13. One or more computer readable media as recited in claim 12, further 
comprising instructions that cause the one or more processors to perform acts 
including: 

generating a third energy feature for each frame by normalizing the first 
energy feature based on the first energy feature of the eighth frame of the multiple- 
frame grouping; and 

generating a fourth energy feature for each frame by normalizing the 
second energy feature based on the second energy feature of the eighth frame of 
the multiple-frame grouping. 
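
The normalization recited above can be illustrated with a short sketch. It assumes that "normalizing ... based on" the eighth frame means dividing each frame's feature, element by element, by the eighth frame's feature; the claims do not fix the operation. The function name and the `reference_index` parameter are hypothetical.

```python
def normalize_by_reference_frame(frame_features, reference_index=7):
    """Divide every frame's feature vector by the reference frame's vector.

    Assumption: normalization is element-wise division. The eighth frame of
    the grouping has index 7 because indexing starts at zero.
    """
    if not 0 <= reference_index < len(frame_features):
        raise IndexError("grouping has no frame at the reference index")
    reference = frame_features[reference_index]
    if any(value == 0 for value in reference):
        raise ValueError("reference frame feature must be non-zero")
    return [
        [value / ref for value, ref in zip(frame, reference)]
        for frame in frame_features
    ]


if __name__ == "__main__":
    # A 25-frame grouping with a two-band energy feature per frame.
    frames = [[float(i + 1), 2.0 * (i + 1)] for i in range(25)]
    normalized = normalize_by_reference_frame(frames)
    print(normalized[7])
```

A scalar feature, such as the third-band amplitude, can be passed as one-element lists.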

14. One or more computer readable media as recited in claim 1, wherein 
the combining comprises generating, for each of the set of segments, a weighted 
sum of a probability that the segment includes excited speech and a probability 
that a frame grouping within the segment includes a baseball hit. 
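
The weighted sum recited above might look like the following sketch. The weights, the requirement that they sum to one, and the choice of the highest hit probability among the frame groupings in the segment are all assumptions made for illustration. The function name `fuse_segment` is hypothetical.

```python
def fuse_segment(p_excited, hit_probs_in_segment, w_speech=0.5, w_hit=0.5):
    """Combine an excited speech probability with baseball hit probabilities.

    Assumptions: the weights sum to 1, and the segment's hit probability is
    the largest probability among the frame groupings inside the segment
    (0.0 when the segment contains no candidate grouping).
    """
    if abs(w_speech + w_hit - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    p_hit = max(hit_probs_in_segment, default=0.0)
    return w_speech * p_excited + w_hit * p_hit


if __name__ == "__main__":
    print(fuse_segment(0.8, [0.2, 0.6]))
    print(fuse_segment(0.8, []))
```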

15. One or more computer readable media as recited in claim 1, wherein 
the combining comprises adjusting a probability that a segment of the set of 
segments includes excited speech based on a probability that the segment includes 
a baseball hit.

16. A method comprising:

receiving a program including both audio and video; 
receiving meta data corresponding to the program; and 
rendering, based on the meta data, portions of the program as a summary of 
the program. 

17. A method as recited in claim 16, wherein the rendering comprises 
displaying the video of the portions and playing the audio of the portions. 

18. A method as recited in claim 16, wherein the meta data comprises a 
probability indicator, for each of a plurality of portions of the program, that 
identifies a probability that the corresponding portion is an exciting portion of the 
program. 

19. A method as recited in claim 18, wherein the rendering comprises 
selecting the plurality of portions that have probability indicators that exceed a 
threshold value, and rendering the selected portions as the summary. 
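
The threshold-based selection recited above can be sketched as follows. The layout of the meta data, here a list of `(start_seconds, end_seconds, probability)` tuples, is a hypothetical representation chosen for illustration; the claims do not define a format.

```python
def select_summary(portions, threshold):
    """Return the (start, end) times of portions whose probability exceeds
    the threshold, in their original order.

    Assumption: each portion is a (start_seconds, end_seconds, probability)
    tuple.
    """
    return [
        (start, end)
        for start, end, probability in portions
        if probability > threshold
    ]


if __name__ == "__main__":
    meta_data = [(0, 12, 0.15), (12, 24, 0.82), (24, 36, 0.40), (36, 48, 0.91)]
    print(select_summary(meta_data, threshold=0.5))
```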

20. A method as recited in claim 16, wherein the receiving a program 
and the receiving meta data comprise receiving both the program and the meta 
data from the same source. 

21. A method as recited in claim 16, wherein the receiving meta data 
comprises receiving the meta data from a remote source via a network.

22. A method as recited in claim 21, wherein the network comprises the
Internet. 

23. A method as recited in claim 16, wherein the receiving meta data 
comprises receiving meta data generated manually. 

24. A method as recited in claim 16, wherein the receiving meta data 
comprises receiving meta data generated automatically. 

25. A method as recited in claim 16, wherein the meta data comprises a 
plurality of probabilities, each corresponding to a segment of the program, the 
probabilities representing a probabilistic combination of sports-specific events and 
sports-generic events identified in the program. 

26. A method as recited in claim 25, wherein the sports-specific events 
comprise baseball hits, and wherein the sports-generic events comprise excited 
speech. 

27. One or more computer readable media including a computer 
program that is executable by a processor to perform the method recited in claim 
16. 

28. A system comprising: 

a content provider to make programming content available to requesting 
clients;

a meta data provider to make meta data corresponding to the programming
content available to the requesting clients, wherein the meta data identifies, for
each of a plurality of portions of the programming content, an indicator of a 
likelihood that the corresponding portion is an exciting portion of the 
programming content; and 

a plurality of receivers coupled to receive the programming content from 
the content provider and the meta data from the meta data provider. 

29. A system as recited in claim 28, wherein the content provider and 
the meta data provider are the same. 

30. A system as recited in claim 28, wherein the content provider and 
the meta data provider are coupled to the plurality of receivers via the Internet. 

31. A system as recited in claim 28, wherein the plurality of receivers 
are further to render, based on the meta data, portions of the programming content 
as a summary of the programming content. 

32. A system as recited in claim 28, wherein the meta data comprises a 
probability indicator, for each of a plurality of portions of the programming 
content, that identifies a probability that the corresponding portion is an exciting 
portion of the program.

33. A system as recited in claim 32, wherein the plurality of receivers
are further to select, from the plurality of portions, portions that have probability 
indicators that exceed a threshold value, and render the selected portions as the 
summary. 

34. A system as recited in claim 28, wherein the meta data is generated 
manually. 

35. A system as recited in claim 28, wherein the meta data is generated 
automatically. 

36. A system as recited in claim 28, wherein the meta data comprises a 
plurality of probabilities, each corresponding to a segment of the programming 
content, the probabilities representing a probabilistic combination of sports- 
specific events and sports-generic events identified in the programming content. 

37. A system as recited in claim 36, wherein the sports-specific events 
comprise baseball hits, and wherein the sports-generic events comprise excited 
speech. 

38. A method of automatically summarizing a program, the method 
comprising: 

identifying a plurality of sports-generic events from the audio of the 
program;

identifying a plurality of sports-specific events from the audio of the
program; and 

identifying, by combining the sports-generic events and the sports-specific 
events, a set of portions of the program as a summary of the program. 

39. A method as recited in claim 38, further comprising transmitting the 
set of portions to a client computer as the summary of the program. 

40. A method as recited in claim 39, wherein the transmitting comprises 
transmitting the set of portions via the Internet. 

41. A method as recited in claim 38, wherein the sports-specific events 
comprise baseball hits, and wherein the sports-generic events comprise excited 
speech. 

42. A method as recited in claim 38, wherein the program includes both 
an audio portion and a video portion. 

43. One or more computer readable media including a computer 
program that is executable by a processor to perform the method recited in claim 
38. 

44. A method comprising: 

analyzing audio data of a program to identify a first plurality of portions of 
the program each including excited speech;

analyzing the audio data to identify a second plurality of portions of the
program each including a potential baseball hit; and 

combining the first plurality of portions and the second plurality of portions 
to identify a set of segments of the program and a likelihood, for each of the 
segments in the set, that the segment is an exciting part of the program. 

45. A method as recited in claim 44, wherein the analyzing audio data to
identify the first plurality of portions comprises:

extracting a first set of features from the audio data;

identifying, based on the first set of features, a plurality of windows of the
audio data that include speech;

extracting a second set of features from the audio data; and

identifying, based on the second set of features, which of the plurality of
windows include excited speech.

46. A method as recited in claim 45, wherein extracting the first set of 
features comprises, for each of a plurality of windows: 

identifying an average waveform amplitude of the audio data in a first 
frequency band of the window; 

identifying an average waveform amplitude of the audio data in a second 
frequency band of the window; 

concatenating the identified average waveform amplitudes to generate an 
energy feature of the first set of features; and 

determining, for an MFCC feature of the first set of features, the Mel-
frequency Cepstral coefficient of the window.

47. A method as recited in claim 46, wherein each window comprises
0.5 seconds. 

48. A method as recited in claim 46, wherein the identifying a plurality 
of windows of the audio data that include speech comprises determining that a 
window includes speech if the energy feature exceeds a first threshold and the 
MFCC feature exceeds a second threshold. 

49. A method as recited in claim 45, wherein extracting the second set 
of features comprises, for each of a plurality of windows: 

identifying, for each of a plurality of frames in the window, an average 
waveform amplitude of the audio data in a first frequency band; 

identifying, for each of the plurality of frames in the window, an average 
waveform amplitude of the audio data in a second frequency band; 

concatenating the identified average waveform amplitudes to generate an 
energy feature; 

extracting, as a pitch feature of each frame, the pitch of each frame; and 
identifying a plurality of statistics regarding each window based on the 
energy features and pitch features of the plurality of frames. 

50. A method as recited in claim 49, wherein the identifying a plurality 
of statistics further comprises: 

identifying a maximum energy; 
identifying an average energy;
identifying an energy dynamic range;
identifying a maximum pitch; 
identifying an average pitch; and 
identifying a dynamic pitch range. 

51. A method as recited in claim 49, wherein the identifying which of 
the plurality of windows include excited speech comprises identifying a posterior 
probability that the window corresponds to an excited speech class and identifying 
a posterior probability that the window corresponds to a non-excited speech class, 
and classifying the window in the class with the higher posterior probability. 

52. A method as recited in claim 51, wherein the posterior probabilities
are identified using a parametric machine. 

53. A method as recited in claim 51, wherein the posterior probabilities
are identified using a non-parametric machine. 

54. A method as recited in claim 51, wherein the posterior probabilities
are identified using a semi-parametric machine.

55. A method as recited in claim 45, further comprising outputting an
excited speech probability corresponding to each of a plurality of segments that
include excited speech, each segment including a plurality of windows, the excited 
speech probability for a segment indicating a likelihood that the segment includes 
excited speech. 

56. A method as recited in claim 55, wherein the excited speech 
probability is determined by averaging the posterior probabilities of each of the 
plurality of windows in the segment, the posterior probability of a window 
identifying the probability that the window includes excited speech. 
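
The window classification and segment averaging recited above can be illustrated with a small sketch. It assumes a single scalar feature per window and a Gaussian class-conditional model for each class, which is one parametric choice among many; the claims do not prescribe a model. All function names, the `(mean, std)` class parameters, and the prior are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_excited(x, excited, non_excited, prior_excited=0.5):
    """Posterior probability of the excited class by Bayes' rule.

    Assumption: each class is modeled by a (mean, std) Gaussian over one
    scalar window feature.
    """
    p_e = gaussian_pdf(x, *excited) * prior_excited
    p_n = gaussian_pdf(x, *non_excited) * (1.0 - prior_excited)
    total = p_e + p_n
    # If both likelihoods underflow to zero, fall back to an even split.
    return p_e / total if total > 0.0 else 0.5


def classify_window(posterior):
    """Pick the class with the higher posterior; ties go to non-excited."""
    return "excited" if posterior > 0.5 else "non-excited"


def segment_probability(window_posteriors):
    """Average the window posteriors to get the segment's probability."""
    if not window_posteriors:
        raise ValueError("a segment needs at least one window")
    return sum(window_posteriors) / len(window_posteriors)


if __name__ == "__main__":
    excited, calm = (1.0, 1.0), (-1.0, 1.0)
    posteriors = [posterior_excited(x, excited, calm) for x in (1.0, 0.5, 2.0)]
    print([classify_window(p) for p in posteriors])
    print(segment_probability(posteriors))
```

With only two classes, choosing the class with the higher posterior is the same as comparing the excited speech posterior with 0.5.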

57. A method as recited in claim 44, wherein the analyzing audio data to 
identify the second plurality of portions comprises: 

extracting, for each frame in a multiple-frame grouping, a set of features 
from the audio data; 

comparing the set of features from the multiple-frame groupings to a set of 
templates; and 

identifying, for each of the multiple-frame groupings, a probability that the 
grouping includes a baseball hit based on how well the grouping matches the set of 
templates. 

58. A method as recited in claim 57, wherein the extracting comprises, 
for each frame: 

identifying an average waveform amplitude of the audio data in a first 
frequency band of the frame;

identifying an average waveform amplitude of the audio data in a second 
frequency band of the frame; 

concatenating the identified average waveform amplitudes to generate a 
first energy feature of the set of features; 

identifying an average waveform amplitude of the audio data in a third 
frequency band of the frame; and 

using, as a second energy feature of the set of features, the average 
waveform amplitude of the third frequency band. 

59. A method as recited in claim 58, further comprising:

generating a third energy feature for each frame by normalizing the first 
energy feature based on the first energy feature of the eighth frame of the multiple- 
frame grouping; and 

generating a fourth energy feature for each frame by normalizing the 
second energy feature based on the second energy feature of the eighth frame of 
the multiple-frame grouping. 

60. A method as recited in claim 44, wherein the combining comprises 
generating, for each of the first plurality of portions, a weighted sum of a 
probability that the portion includes excited speech and a probability that the 
portion includes a baseball hit.

61. A method as recited in claim 44, wherein the combining comprises
adjusting the probability that a portion includes excited speech based on the 
probability that the portion includes a baseball hit. 

62. A system comprising: 

a feature extractor to extract a plurality of audio features from programming 
content; 

an excited speech classification subsystem to identify, based on a sub-set of 
the audio features, a set of segments of the programming content and 
corresponding probabilities that the segments include excited speech; 

a baseball hit detection subsystem to identify, based on another sub-set of 
the audio features, a set of frame groupings of the programming content and 
corresponding probabilities that the frame groupings include baseball hits; and 

a probabilistic fusion subsystem to combine the probabilities that the 
segments include excited speech and the probabilities that the frame groupings 
include baseball hits, and to generate a probability that portions of the 
programming content are exciting based on the combination. 

63. A system as recited in claim 62, wherein the excited speech 
classification subsystem is to identify the set of segments by: 

identifying, based on a first set of the plurality of audio features, a plurality 
of windows of the programming content that include speech; and

identifying, based on a second set of the plurality of audio features, which 
of the plurality of windows include excited speech.

64. A system as recited in claim 62, wherein the baseball hit detection
subsystem is to identify the set of frame groupings by: 

combining, for each of a plurality of multiple-frame groupings, a set of 
features from the programming content; 

comparing the sets of features from the multiple-frame groupings to a set of 
templates; and 

identifying, for each of the multiple-frame groupings, a probability that the 
grouping includes a baseball hit based on how well the grouping matches the set of 
templates. 

65. A system comprising: 

a receiving device to receive a sporting event; 

a user interface to receive, from a user, an indication of a desired summary 
level for the sporting event; and 

a processing subsystem to identify which portions of the sporting event to 
present to the user based at least in part on both the desired summary level and 
meta data corresponding to the sporting event, the meta data identifying a 
likelihood of each of a plurality of portions of the sporting event being exciting 
based at least in part on the presence of both excited speech and ball hits within 
the sporting event. 
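
One way a processing subsystem might turn a desired summary level into a set of portions is sketched below. The claim does not define what a summary level is, so this sketch assumes it is the fraction of the event's total duration the user wants to watch, and it greedily keeps the most likely portions that fit within that budget. The tuple layout and the name `pick_portions` are hypothetical.

```python
def pick_portions(portions, summary_level):
    """Choose portions to present for a given summary level.

    Assumptions: each portion is a (start_seconds, end_seconds, likelihood)
    tuple, and summary_level in (0, 1] is the fraction of total duration to
    keep. Portions are taken in order of decreasing likelihood while they
    fit, then returned in chronological order.
    """
    if not 0.0 < summary_level <= 1.0:
        raise ValueError("summary_level is assumed to lie in (0, 1]")
    total = sum(end - start for start, end, _ in portions)
    budget = summary_level * total
    chosen, used = [], 0.0
    for start, end, likelihood in sorted(
        portions, key=lambda portion: portion[2], reverse=True
    ):
        length = end - start
        if used + length <= budget:
            chosen.append((start, end, likelihood))
            used += length
    return sorted(chosen)


if __name__ == "__main__":
    event = [(0, 10, 0.2), (10, 20, 0.9), (20, 30, 0.5), (30, 40, 0.7)]
    print(pick_portions(event, summary_level=0.5))
```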